================================================================================
README: Matrix Product Approximation - Experiment 3: Scalability Analysis
================================================================================

### Project Title

Matrix Product Approximation - Experiment 3: Scalability Analysis

### Objective

This experiment investigates how various matrix product approximation algorithms, and their theoretical error bounds, scale with the common dimension 'n' of the input matrices A (m x n) and B (p x n). The goal is to observe how the empirical approximation errors and computation times of the different methods change as 'n' increases, while 'm', 'p', and the ratio k/n (where 'k' is the sketch size, i.e. the number of selected columns/features) are held constant. The experiment evaluates the algorithms and bounds on their own, without direct comparison to the optimal SVD-based solution (v_k^*), which can be computationally prohibitive for large 'n'.

### File Structure

The project consists of the following main components:

1.  `matrix_product_approximations_exp3.py`:
    *   This Python script contains the core library functions for Experiment 3.
    *   It includes:
        *   Helper functions (e.g., Frobenius norm, rho_G calculation).
        *   Matrix generation utilities.
        *   Implementations of theoretical bound calculations (user-defined and standard bounds).
        *   Implementations of various matrix product approximation algorithms:
            *   Optimal Leverage Score Sampling
            *   CountSketch
            *   Subsampled Randomized Hadamard Transform (SRHT)
            *   Gaussian Projection
            *   Greedy Orthogonal Matching Pursuit (OMP) style selection
        *   The main experiment execution logic (`run_experiment_3_scalability`).
        *   Plotting functions (`plot_experiment_3_scalability`) to visualize the results.
    *   It also handles global plotting styles and directory creation for outputs.

2.  `run_experiment3.py`:
    *   This is the main executable Python script for running Experiment 3.
    *   It imports necessary functions from `matrix_product_approximations_exp3.py`.
    *   Users can easily modify experiment parameters (like the range of 'n' values, m, p, k/n ratio, number of trials, matrix distribution) within this file.
    *   It calls the experiment running function, saves the collected data to a JSON file, and then generates plots.

3.  `plots/` (directory):
    *   This directory is automatically created when the scripts are run.
    *   It stores the output plots generated by `plot_experiment_3_scalability`.
    *   Example filenames:
        *   `Exp3_Scalability_Final_Gaussian_m50_p30_kratio0pt2_main_nolegend.png`
        *   `Exp3_Scalability_Final_Gaussian_m50_p30_kratio0pt2_rho_G.png`

4.  `results/` (directory):
    *   This directory is automatically created.
    *   It stores the raw numerical results of the experiment in JSON format.
    *   Example filename: `Exp3_Scalability_Gaussian_m50_p30_kratio0pt2_results.json`
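Of the approximation algorithms listed under `matrix_product_approximations_exp3.py`, CountSketch is the simplest to illustrate. The following is a minimal, standalone sketch of the estimator — the function name and signature are illustrative, not the library's actual `run_countsketch`:

```python
import numpy as np

def countsketch_product(A, B, k, seed=None):
    """Approximate A @ B.T by sketching the shared dimension n down to k.

    A is (m, n) and B is (p, n). Each of the n shared columns is hashed
    to one of k buckets with an independent random sign; the estimator
    (A S^T)(B S^T)^T is unbiased for A @ B.T.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    buckets = rng.integers(0, k, size=n)      # hash each column to a bucket
    signs = rng.choice([-1.0, 1.0], size=n)   # independent random signs

    S = np.zeros((k, n))                      # explicit sketch matrix, for clarity
    S[buckets, np.arange(n)] = signs
    return (A @ S.T) @ (B @ S.T).T            # (m, p) approximation of A @ B.T
```

A production implementation would apply the sketch via scatter-adds in O(n(m + p)) time rather than forming `S` explicitly; the dense version above is just the clearest statement of the estimator.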

### Setup

1.  **Python**: Ensure you have Python 3.x installed.
2.  **Libraries**: Install the required Python libraries. You can typically install them using pip:
    ```bash
    pip install numpy scipy matplotlib cvxpy pandas tqdm
    ```
    *   `numpy`: For numerical operations and array manipulation.
    *   `scipy`: For scientific and technical computing (used for some linear algebra).
    *   `matplotlib`: For generating plots.
    *   `cvxpy`: For convex optimization (used in one of the theoretical bounds).
    *   `pandas`: For data structuring (only lightly used in Experiment 3's direct output).
    *   `tqdm`: For displaying progress bars during long computations.

### How to Run Experiment 3

1.  **Navigate to Directory**: Open your terminal or command prompt and navigate to the directory where you have saved `matrix_product_approximations_exp3.py` and `run_experiment3.py`.

2.  **Modify Parameters (Optional)**:
    Open `run_experiment3.py` in a text editor. You can adjust the following parameters at the beginning of the `if __name__ == "__main__":` block:
    *   `N_VALUES_EXP3_RUN`: A list of 'n' (common dimension) values to test (e.g., `[200, 400, 800, 1500, 2500, 4000, 6000, 8000]`).
    *   `M_DIM_EXP3_RUN`: The number of rows for matrix A (e.g., `50`).
    *   `P_DIM_EXP3_RUN`: The number of rows for matrix B (e.g., `30`).
    *   `K_RATIO_EXP3_RUN`: The ratio k/n. The sketch size 'k' will be calculated as `int(n * K_RATIO_EXP3_RUN)` for each 'n' (e.g., `0.2`).
    *   `N_TRIALS_EXP3_RUN`: The number of times randomized algorithms are run for each 'n' to average their performance (e.g., `3`).
    *   `BASE_SEED_EXP3_RUN`: The base seed for random number generation to ensure reproducibility (e.g., `2025`).
    *   `DISTRIBUTION_TYPE_EXP3_RUN`: The distribution used for generating matrix entries. Options: `'gaussian'` or `'uniform'` (e.g., `'gaussian'`).

3.  **Execute Script**: Run the main script from your terminal:
    ```bash
    python run_experiment3.py
    ```

    The script will print progress updates to the console. Depending on the parameters (especially the largest 'n' and `N_TRIALS_EXP3_RUN`), the experiment might take some time to complete.
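Put together, the parameter block edited in step 2 might look like the following (the values are the examples given above; the exact layout inside `run_experiment3.py` may differ slightly):

```python
# Illustrative parameter block for run_experiment3.py
# (values taken from the examples above).
N_VALUES_EXP3_RUN = [200, 400, 800, 1500, 2500, 4000, 6000, 8000]
M_DIM_EXP3_RUN = 50          # rows of A
P_DIM_EXP3_RUN = 30          # rows of B
K_RATIO_EXP3_RUN = 0.2       # sketch size k = int(n * K_RATIO_EXP3_RUN)
N_TRIALS_EXP3_RUN = 3        # trials per n for the randomized methods
BASE_SEED_EXP3_RUN = 2025    # base RNG seed for reproducibility
DISTRIBUTION_TYPE_EXP3_RUN = 'gaussian'   # or 'uniform'

# With k/n fixed, the sketch size grows linearly with n:
k_values = [int(n * K_RATIO_EXP3_RUN) for n in N_VALUES_EXP3_RUN]
```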

### Output

After the script finishes, you will find:

1.  **Plots**: In the `plots/` directory.
    *   One main plot showing relative squared errors and bound values vs. 'n' on one subplot, and computation times vs. 'n' on another subplot. Both axes are typically on a log scale.
    *   A separate plot showing the calculated `rho_G` values vs. 'n', if `rho_G` could be computed.
    The filenames will reflect the parameters used (e.g., distribution, m, p, k_ratio).

2.  **JSON Data**: In the `results/` directory.
    *   A `.json` file containing the detailed numerical results. This file stores a dictionary where keys are the 'n' values. Each 'n' value maps to another dictionary containing:
        *   `k`: The sketch size used for that 'n'.
        *   `rho_G`: The calculated `rho_G` value.
        *   `matrix_dist_type`: The matrix distribution used.
        *   `m_dim`, `p_dim`: Dimensions m and p.
        *   `results`: A sub-dictionary with relative squared errors for algorithms and values for bounds.
        *   `times`: A sub-dictionary with computation times for each algorithm and bound.
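A minimal way to load and inspect such a results file is sketched below (the helper name is illustrative; note that JSON object keys are strings, so the 'n' keys must be converted back to integers for numeric sorting):

```python
import json

def summarize_exp3_results(path):
    """Load an Experiment 3 results JSON and return (n, k, rho_G) triples,
    sorted by n. JSON object keys are strings, hence the int() conversions."""
    with open(path) as f:
        data = json.load(f)
    return [(int(n), data[n]["k"], data[n]["rho_G"])
            for n in sorted(data, key=int)]

# Usage (the path is the example filename from above; adjust to your run):
# for n, k, rho in summarize_exp3_results(
#         "results/Exp3_Scalability_Gaussian_m50_p30_kratio0pt2_results.json"):
#     print(f"n={n}: k={k}, rho_G={rho}")
```

The per-method errors and timings live under `data[n]["results"]` and `data[n]["times"]` and can be pulled out the same way for custom analysis.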

### Interpreting Results

*   **Plots**:
    *   **Main Plot (Errors/Bounds & Times)**:
        *   *Left Subplot (Errors/Bounds vs. n)*: Observe how the relative squared errors of different approximation algorithms (e.g., 'Optimal Sampling', 'CountSketch', 'SRHT') and the values of theoretical bounds (e.g., 'Bound (QP Aux)', 'Bound (Sampling)') scale with increasing 'n'. Lower lines indicate better approximation or tighter bounds.
        *   *Right Subplot (Times vs. n)*: Analyze the computational cost (time in seconds) of each method as 'n' increases. This helps understand the practical scalability of each approach.
    *   **Rho_G Plot**: This plot shows how the `rho_G` metric (related to the alignment of column norms of A and B) changes with 'n' for the generated matrices. `rho_G` can influence the performance of some bounds and algorithms.

*   **JSON File**:
    *   The JSON file provides the raw data used to generate the plots. It can be used for more detailed numerical analysis or for creating custom plots.
    *   Values for errors and bounds are typically relative squared errors (||AB^T - Approx(AB^T)||_F^2 / ||AB^T||_F^2) or bound values normalized by ||AB^T||_F^2.
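The relative squared error above can be computed directly with NumPy; the library's `frob_norm_sq` helper presumably serves the same purpose, but the standalone sketch below makes the metric explicit:

```python
import numpy as np

def relative_squared_error(A, B, approx):
    """||A B^T - approx||_F^2 / ||A B^T||_F^2 -- the normalized error
    metric stored in the JSON results."""
    exact = A @ B.T
    num = np.linalg.norm(exact - approx, "fro") ** 2
    den = np.linalg.norm(exact, "fro") ** 2
    return num / den
```

By this convention an exact approximation scores 0 and the trivial all-zeros approximation scores 1, which makes curves comparable across different 'n'.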

### Code Structure Overview

*   **`matrix_product_approximations_exp3.py`**:
    *   **Global Settings**: `plt.rcParams` for Matplotlib, `ADAPTED_EXPERIMENT_STYLES` for plot aesthetics.
    *   **Helper Functions**: `frob_norm_sq`, `calculate_rho_g`.
    *   **Matrix Generation**: `generate_matrices` (supports Gaussian, uniform distributions, noise, cancellation).
    *   **Theoretical Bounds**:
        *   `compute_theoretical_bounds`: Calculates user-provided bounds (Binary, QP Analytical, QP CVXPY).
        *   `compute_standard_bounds`: Calculates standard bounds (Leverage Score Expectation, Simple Sketching).
    *   **Algorithm Implementations**:
        *   `run_leverage_score_sampling`
        *   `run_countsketch`
        *   `run_gaussian_projection`
        *   `run_greedy_selection_omp`
        *   `fast_walsh_hadamard_transform_manual` (helper for SRHT)
        *   `pad_matrix` (helper for SRHT)
        *   `run_srht_new` (SRHT implementation)
    *   **Experiment Runner**: `run_experiment_3_scalability` orchestrates the trials for different 'n', calls algorithms and bound computations, and collects results.
    *   **Plotting**: `plot_experiment_3_scalability` takes the collected results and generates the summary plots.

*   **`run_experiment3.py`**:
    *   Sets high-level parameters for Experiment 3.
    *   Calls `run_experiment_3_scalability` from the library file.
    *   Handles saving the results to a JSON file.
    *   Calls `plot_experiment_3_scalability` to generate and save plots.
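The SRHT helpers mentioned above center on a fast Walsh-Hadamard transform applied after padding to a power-of-two length. A standard in-place, O(n log n) formulation (a generic sketch, not necessarily identical to `fast_walsh_hadamard_transform_manual`) is:

```python
import numpy as np

def fwht(x):
    """Unnormalized fast Walsh-Hadamard transform of a vector whose length
    is a power of two; runs in O(n log n) via butterfly passes."""
    x = np.asarray(x, dtype=float).copy()
    n = x.shape[0]
    assert n & (n - 1) == 0 and n > 0, "length must be a power of two"
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):          # butterfly over blocks of size 2h
            a = x[i:i + h].copy()
            b = x[i + h:i + 2 * h].copy()
            x[i:i + h] = a + b
            x[i + h:i + 2 * h] = a - b
        h *= 2
    return x
```

This is why a `pad_matrix` step is needed: SRHT zero-pads the sketched dimension up to the next power of two before the transform is applied.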

This structure allows for separation of concerns: the library contains the core logic, while the main script handles execution and parameterization for a specific experimental run.